A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Authors

  • Iman Sahraei Dehmajnoonie Science and Research Branch, Islamic Azad University, Kerman, Iran
  • Keivan Borna Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
  • Vahid Hajihashemi Student Member, IEEE
  • Zeinab Hassani Department of Computer Science, Kosar University of Bojnourd, Iran
Abstract:

Spam is unwanted email that harms communication around the world. It is a growing problem for personal email, so detecting it is essential. Machine learning is well suited to this problem because its adaptive nature lets it learn the patterns required for classification. However, spam detection involves a large number of features, and these play an essential role in detection efficiency. In this article, we propose a feature selection method for e-mail spam detection. The approach is a hybrid of optimization algorithms and machine learning classifiers: Binary Whale Optimization (BWO) and Binary Grey Wolf Optimization (BGWO) algorithms are used for feature selection, and K-Nearest Neighbor (KNN) and Fuzzy K-Nearest Neighbor (FKNN) algorithms are applied as the classifiers. The proposed method is tested on the "SPAMBASE" dataset from the UCI Machine Learning Repository, and the experimental results show a highest accuracy of 97.61% on this dataset. The results indicate that the proposed method is suitable and provides excellent performance in comparison with other methods.
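
The wrapper idea described in the abstract (a binary metaheuristic proposes feature subsets, and a KNN classifier scores them) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a grey-wolf-style update with a sigmoid transfer function, a plain NumPy KNN instead of FKNN, synthetic data instead of SPAMBASE, and hypothetical names and parameters (`binary_gwo`, `fitness`, `alpha=0.99`, the steepness factor 10).

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test, k=5):
    """Accuracy of a plain k-nearest-neighbour classifier (binary labels 0/1)."""
    # Squared Euclidean distance between every test row and every training row.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    pred = (y_train[nearest].mean(axis=1) > 0.5).astype(int)  # majority vote
    return float((pred == y_test).mean())

def fitness(mask, data, alpha=0.99):
    """Assumed wrapper fitness: mostly accuracy, plus a small reward for fewer features."""
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    X_tr, y_tr, X_te, y_te = data
    cols = mask.astype(bool)
    acc = knn_accuracy(X_tr[:, cols], y_tr, X_te[:, cols], y_te)
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)

def binary_gwo(data, n_features, n_wolves=8, n_iter=20, seed=0):
    """Simplified binary grey-wolf-style search over 0/1 feature masks."""
    rng = np.random.default_rng(seed)
    wolves = rng.integers(0, 2, size=(n_wolves, n_features))
    scores = np.array([fitness(w, data) for w in wolves])
    best = int(scores.argmax())
    best_mask, best_fit = wolves[best].copy(), float(scores[best])
    for t in range(n_iter):
        # The three fittest wolves (alpha, beta, delta) guide the pack.
        leaders = wolves[np.argsort(scores)[::-1][:3]].astype(float)
        a = 2 - 2 * t / n_iter  # exploration coefficient, decays from 2 towards 0
        new = np.empty_like(wolves)
        for i, w in enumerate(wolves):
            pos = np.zeros(n_features)
            for leader in leaders:
                A = 2 * a * rng.random(n_features) - a
                C = 2 * rng.random(n_features)
                pos += leader - A * np.abs(C * leader - w)
            # Sigmoid transfer function maps the continuous position to bit probabilities.
            prob = 1 / (1 + np.exp(-10 * (pos / 3 - 0.5)))
            new[i] = rng.random(n_features) < prob
        wolves = new
        scores = np.array([fitness(w, data) for w in wolves])
        best = int(scores.argmax())
        if scores[best] > best_fit:
            best_mask, best_fit = wolves[best].copy(), float(scores[best])
    return best_mask, best_fit

# Synthetic stand-in for a spam dataset: only the first 3 of 12 features carry signal.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 12))
X[:, :3] += 2.0 * y[:, None]
data = (X[:140], y[:140], X[140:], y[140:])

mask, fit = binary_gwo(data, n_features=12)
print("selected features:", np.flatnonzero(mask), "fitness:", round(fit, 4))
```

The fitness weighting trades accuracy against subset size; with `alpha` close to 1, accuracy dominates and the feature-count term mainly breaks ties between equally accurate subsets.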


Similar resources

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Email is one of the fastest and most economical means of communication. The growing number of email users has led to an increase in spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of networks. Spam developers are always trying to find ways to escape the existing filters; therefore new filters to de...


A Novel Feature Selection Based on One-Way ANOVA F-Test for E-Mail Spam Classification

Spam is commonly defined as unwanted e-mail, and it has become a global threat to e-mail users. Although the Support Vector Machine (SVM) has been commonly used in e-mail spam classification, the problem of high dimensionality of the feature space, due to the massive number of e-mails and features in the datasets, still exists. To improve the limitation of SVM, reduce the computational complexity (e...


A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...


H-BwoaSvm: A Hybrid Model for Classification and Feature Selection of Mammography Screening Behavior Data

Breast cancer is one of the most common cancers in the world. Early detection of cancer significantly reduces the morbidity rate and treatment costs. Mammography is a known effective method for diagnosing breast cancer. One way to identify mammography screening behavior is to evaluate women's awareness of participating in mammography screening programs. Today, intelligence systems could...


A Novel One Sided Feature Selection Method for Imbalanced Text Classification

Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and might even treat the minority class data as outliers. The text data is one of t...




Journal title

volume 31, issue 2

pages 165-173

publication date 2020-04-01
